Abstract. We study the problem of parameter estimation for stochastic differential equations
with small noise and fast oscillating parameters. Depending on how fast the intensity of the noise
goes to zero relative to the homogenization parameter, we consider three different regimes. For
each regime, we construct the maximum likelihood estimator and we study its consistency and
asymptotic normality properties. A simulation study for the first order Langevin equation with a
two-scale potential is also provided.
1. Introduction

Data obtained from a physical system sometimes possess many characteristic length and time
scales. In such cases, it is desirable to construct models that are effective for large-scale structures,
while capturing small scales at the same time. Modeling this type of data via diffusion-type models
may be well suited in many cases. Thus, multiscale diffusion models have been used to describe
the behavior of physical phenomena in scientific areas such as chemistry and biology [6, 16, 24, 26],
ocean-atmosphere sciences [15], and finance and econometrics [14]. In many of these problems, the
noise is taken to be small because one may, for example, be interested in modeling (a) rare
transition events between equilibrium states of a rough energy landscape [18, 25], or (b) short
time maturity asymptotics for fast mean-reverting stochastic volatility models [11]. See also
[10, 16] for a thorough discussion on different mathematical and statistical modeling aspects of
perturbations of dynamical systems by small noise.

Parameter estimation in multiscale models with small noise is a problem of great practical importance, due to their wide range of applications, but also of great difficulty, due to the different
separating scales. The goal of this paper is to develop a theoretical framework for the estimation
of unknown parameters in a multiscale diffusion model with vanishing noise. More specifically,
let $T > 0$ be given and consider the $d$-dimensional process $X^\varepsilon = \{X^\varepsilon_t, 0 \le t \le T\}$ satisfying the
stochastic differential equation (SDE)

$$dX^\varepsilon_t = \left[\frac{\varepsilon}{\delta}\, b_\theta\!\left(X^\varepsilon_t, \frac{X^\varepsilon_t}{\delta}\right) + c_\theta\!\left(X^\varepsilon_t, \frac{X^\varepsilon_t}{\delta}\right)\right] dt + \sqrt{\varepsilon}\,\sigma\!\left(X^\varepsilon_t, \frac{X^\varepsilon_t}{\delta}\right) dW_t, \qquad X^\varepsilon_0 = x_0, \tag{1.1}$$

where $\delta = \delta(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$, $\theta \in \Theta$ is an unknown parameter and $W_t$ is a standard $d$-dimensional Wiener process. The functions $b_\theta(x,y)$, $c_\theta(x,y)$ and $\sigma(x,y)$ are assumed to be smooth,
in the sense of Condition 2.1, and periodic with period $\lambda$ in every direction with respect to the
second variable.
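To make the model (1.1) concrete, the following Python sketch simulates a scalar ($d = 1$) version with the Euler–Maruyama scheme. It is a minimal illustration under stated assumptions, not the scheme used in the paper; the function name `euler_maruyama` and its arguments are hypothetical. Since the fast variable $X^\varepsilon_t/\delta$ evolves on time scales of order $\delta^2/\varepsilon$ and $\delta$, the step size should be chosen much smaller than both.

```python
import math
import random


def euler_maruyama(b, c, sigma, x0, eps, delta, T, n_steps, seed=0):
    """Simulate a scalar (d = 1) version of (1.1) with the Euler-Maruyama scheme.

    b, c, sigma are callables of (x, y); the fast variable is y = x / delta.
    Returns the list of grid values [X_0, X_dt, ..., X_T].
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        y = x / delta                                   # fast variable
        drift = (eps / delta) * b(x, y) + c(x, y)       # eps/delta * b + c
        dw = rng.gauss(0.0, math.sqrt(dt))              # Brownian increment ~ N(0, dt)
        x = x + drift * dt + math.sqrt(eps) * sigma(x, y) * dw
        path.append(x)
    return path
```

With `sigma` identically zero and `b = 0`, the scheme reduces to the explicit Euler method for the deterministic ODE $\dot x = c(x, x/\delta)$.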

The rate of convergence of $\delta$ and $\varepsilon$ to zero determines the type of equation that one obtains
in the limit. For example, if $\delta$ is of order 1 as $\varepsilon$ goes to zero, then equation (1.1) reduces to the
deterministic ODE obtained by setting $\varepsilon$ equal to zero. On the other hand, if $\varepsilon$ is of order 1
as $\delta$ goes to zero, then homogenization occurs and this results in an equation with homogenized
coefficients. When both parameters $\varepsilon$ and $\delta$ go to zero together, we need to consider three
different regimes depending on how fast $\varepsilon$ goes to zero relative to $\delta$:

$$\lim_{\varepsilon \downarrow 0} \frac{\varepsilon}{\delta} = \begin{cases} \infty & \text{Regime 1},\\ \gamma \in (0,\infty) & \text{Regime 2},\\ 0 & \text{Regime 3}. \end{cases} \tag{1.2}$$
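As an illustration of (1.2), consider the hypothetical family $\delta = \varepsilon^{a}$ with $a > 0$, for which $\varepsilon/\delta = \varepsilon^{1-a}$. The following sketch (names are illustrative) classifies the regime from the exponent $a$; the choice $\delta = \varepsilon/\gamma$ would give Regime 2 with a general $\gamma$.

```python
def regime_for_power(a: float) -> int:
    """Regime of (1.2) for the illustrative family delta = eps**a, a > 0.

    eps / delta = eps**(1 - a) diverges for a > 1 (Regime 1), equals 1 for
    a == 1 (Regime 2 with gamma = 1) and vanishes for 0 < a < 1 (Regime 3)
    as eps -> 0.
    """
    if a <= 0:
        raise ValueError("need a > 0 so that delta -> 0 as eps -> 0")
    if a > 1:
        return 1
    if a == 1:
        return 2
    return 3


def ratio(eps: float, a: float) -> float:
    """The ratio eps / delta for delta = eps**a."""
    return eps / eps**a


for a in (0.5, 1.0, 1.5):
    print(a, regime_for_power(a), [ratio(e, a) for e in (1e-2, 1e-4, 1e-6)])
```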

We mention here that asymptotic problems for models like (1.1) have a long history in the
mathematical literature. We refer the interested reader to classical manuscripts such as [1, 13, 21]
for averaging and homogenization results, to the more recent articles [3, 12] for large deviations
results, and to [8, 9] for importance sampling results on related rare event estimation problems.

In (1.1) we assume that the drift term, through the functions $b_\theta$ and $c_\theta$, depends on a physical
parameter $\theta$. Generally, from a statistical inference point of view, the main questions of interest
are the following:

(i) How can one estimate the fast oscillating parameter $\delta$ and the intensity of the noise $\varepsilon$?

(ii) How can one estimate the unknown parameter $\theta$?

The first question is undoubtedly a difficult one and is not addressed in the current work;
see [23] for some related results for specific equations and further references. Instead, we focus on
the second question. Thus, assuming that the regime of interaction between $\varepsilon$ and $\delta$ is known, we
want to estimate the unknown parameter $\theta$ at time $T$, based on the continuously observed process
up to this time.

In order to do so, we follow the maximum likelihood method. Maximum likelihood estimation
in multiscale diffusions with noise of order $O(1)$ has been studied by different authors and under
different settings, see for example [2, 3, 16, 21, 22]. We also refer the reader to the manuscripts
[5, 20, 25] for general results on statistical estimation for diffusion processes. The novelty of the
present paper stems from the fact that we address the problem of parameter estimation when both
multiscale effects and small noise are present, for all three regimes in (1.2), which requires a different
approach for the construction of maximum likelihood estimators.

Indeed, in [21, 22], assuming that the noise is of order $O(1)$, the authors fit the data from the
prelimit process to the log-likelihood function of the limiting process, i.e., of the process to which
$X^{\varepsilon=1,\delta}$ converges as $\delta \downarrow 0$. However, when the diffusion coefficient vanishes in the limit, the
limiting process is no longer the solution of an SDE, but of an ODE (see Theorem 2.6); thus it is
deterministic and does not have a well defined likelihood. Therefore, instead of working with the
likelihood function of the limiting process, we work with the log-likelihood of the original multiscale
model and we infer consistency and asymptotic normality (under conditions as described below)
by studying its limit.

In particular, under Regime 1 with $b = 0$ and under Regimes 2 and 3 (see (1.2)), we prove that
the maximum likelihood estimator (MLE) is consistent and asymptotically normal under broad
conditions. The situation of Regime 1 with $b \neq 0$ is more complicated, because the original log-likelihood function does not have a well defined limit as $\varepsilon \downarrow 0$, due to the $\varepsilon/\delta \uparrow \infty$ terms. We address
this issue by introducing a modified (pseudo) log-likelihood which is well defined in the limit. It
turns out that the resulting pseudo MLE is asymptotically biased; however, its bias can be computed
exactly. This is a known problem in multiscale parameter estimation [1, 2, 3, 16, 21, 22];
see Section 3 for more details.

Under Regime 1 with $b \neq 0$, we support our findings with a simulation study for a small noise
diffusion in a two-scale potential field, a model of interest in the physical chemistry literature
[6, 18, 25, 26]. For this particular model, we can construct, based on the biased pseudo MLE, an
estimator that is asymptotically unbiased and normal.

The rest of the paper is organized as follows. In Section 2 we establish the necessary notation
and we present the main ingredients and assumptions needed in the sequel. In Section 3 we discuss
the maximum likelihood estimation problem for all three regimes. For Regimes 2 and 3 and for Regime
1 when $b = 0$, we prove the consistency of the MLE by studying the limit of the log-likelihood function
in Section 4, whereas we prove a central limit theorem for the MLE in Section 5. Finally, in Section
6 we study a particularly interesting case for Regime 1 when $b \neq 0$, namely a small noise diffusion in a
two-scale potential field: we prove a central limit theorem for the pseudo MLE in this particular
setup and we present a simulation study illustrating the theoretical findings.
2. Preliminaries, notation and assumptions

We work with the canonical filtered probability space $(\Omega, \mathcal{F}, P_\theta)$ equipped with a filtration
$\{\mathcal{F}_t\}_{t\ge 0}$ that satisfies the usual conditions, namely, it is right continuous and contains all $P_\theta$-negligible
sets.

Regarding the SDE (1.1) we impose the following condition.

Condition 2.1. (i) The parameter $\theta \in \Theta \subset \mathbb{R}^p$, where $\Theta$ is compact. Also, the coefficients
$b_\theta(x,y)$, $c_\theta(x,y)$ are Lipschitz continuous in $\theta$.

(ii) The functions $b_\theta(x,y)$, $c_\theta(x,y)$, $\sigma(x,y)$ are Lipschitz continuous and bounded in both variables. Moreover, they are periodic with period $\lambda$ in the second variable in each direction.
In the case of Regime 1 we additionally assume that they are sufficiently smooth in $y \in \mathbb{R}^d$ and in
$x \in \mathbb{R}^d$, with all partial derivatives continuous and globally bounded in $x$ and $y$.

(iii) The diffusion matrix $\sigma\sigma^T$ is uniformly nondegenerate.

For notational convenience we define the operator $\cdot : \cdot$, where for two matrices $A = [a_{ij}]$, $B = [b_{ij}]$,

$$A : B = \sum_{i,j} a_{ij} b_{ij}.$$
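The double contraction $A : B$ is the entrywise (Frobenius) inner product, equal to $\operatorname{tr}(A^T B)$. A minimal Python sketch with matrices stored as lists of rows (the name `colon` is hypothetical), together with the second-order term of a generator such as $\mathcal{L}^1_{x,\theta}$ for a diagonal $\sigma\sigma^T$:

```python
def colon(A, B):
    """Return A : B = sum_{i,j} a_ij * b_ij for two matrices of the same shape (lists of rows)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A and B must have the same shape")
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Example: (1/2) alpha : Hessian with alpha = 2D I and D = 0.5 is D times the Laplacian.
alpha = [[1.0, 0.0], [0.0, 1.0]]
hessian = [[2.0, 0.3], [0.3, -1.0]]
generator_term = 0.5 * colon(alpha, hessian)  # 0.5 * (2.0 - 1.0)
```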

Under Regime 1, we also impose the following condition.

Condition 2.2. Consider the second order elliptic partial differential operator

$$\mathcal{L}^1_{x,\theta} = b_\theta(x,y)\cdot\nabla_y + \frac12\,\sigma(x,y)\sigma(x,y)^T : \nabla_y\nabla_y,$$

equipped with periodic boundary conditions in $y$ ($x$ is treated as a parameter here). Let
$\mu^1_\theta(dy;x)$ be the unique invariant measure corresponding to the operator $\mathcal{L}^1_{x,\theta}$. Under Regime 1, we
assume the standard centering condition for the drift term $b$:

$$\int_{\mathcal{Y}} b_\theta(x,y)\,\mu^1_\theta(dy;x) = 0, \quad \text{for all } \theta \in \Theta \text{ and } x \in \mathbb{R}^d,$$

where $\mathcal{Y} = \mathbb{T}^d$ denotes the $d$-dimensional torus.

Under Conditions 2.1 and 2.2, Theorem 3.3.4 in [1] guarantees that for each $\ell \in \{1,\dots,d\}$ there
is a unique, twice differentiable function $\chi_{\ell,\theta}(x,y)$ that is periodic in every direction in $y$ and
which solves the following cell problem:

$$\mathcal{L}^1_{x,\theta}\,\chi_{\ell,\theta}(x,y) = -b_{\ell,\theta}(x,y), \qquad \int_{\mathcal{Y}} \chi_{\ell,\theta}(x,y)\,\mu^1_\theta(dy;x) = 0. \tag{2.1}$$

We write $\chi_\theta = (\chi_{1,\theta},\dots,\chi_{d,\theta})$.

Under Regime 3, we also impose the following condition. 
Condition 2.3. Under Regime 3 and for any $\theta \in \Theta$ and $x \in \mathbb{R}^d$, we assume that the ordinary
differential equation

$$\dot z_t = c_\theta(x, z_t) \tag{2.2}$$

has a unique invariant measure that is Lipschitz continuous in $(\theta, x) \in \Theta \times \mathbb{R}^d$.

Notice that the existence of a unique smooth invariant measure is immediately implied for
Regimes 1 and 2 by Condition 2.1. However, the situation is more complicated for Regime
3, since the operator of interest is a first order operator, for which the non-degeneracy condition clearly does not hold. For example, Condition 2.3 certainly holds in dimension $d = 1$ when $c_\theta(x,y) > 0$
everywhere and it is sufficiently smooth.
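For $d = 1$ the example after Condition 2.3 can be made explicit: with $x$ frozen and $c_\theta(x,\cdot) > 0$, the flow (2.2) winds around the circle and spends time $dy/c_\theta(x,y)$ in $[y, y + dy]$, so its unique invariant measure has density proportional to $1/c_\theta(x,y)$. The Python sketch below checks this numerically for the hypothetical choice $c(y) = 2 + \sin y$ with period $2\pi$, comparing a time average along one full turn of the flow with the average under the density $\propto 1/c$; all names are illustrative.

```python
import math


def flow_time_average(f, c, z0, t_end, n):
    """(1/t_end) * int_0^t_end f(z_t) dt along dz/dt = c(z), using RK4 steps and the trapezoid rule."""
    h = t_end / n
    z = z0
    acc = 0.5 * f(z)
    for k in range(n):
        k1 = c(z)
        k2 = c(z + 0.5 * h * k1)
        k3 = c(z + 0.5 * h * k2)
        k4 = c(z + h * k3)
        z += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        acc += f(z) if k < n - 1 else 0.5 * f(z)  # the endpoint gets weight 1/2
    return acc * h / t_end


def invariant_average(f, c, period, n=20_000):
    """Average of f under the density proportional to 1 / c(y) on [0, period) (requires c > 0)."""
    h = period / n
    ys = [(k + 0.5) * h for k in range(n)]
    weights = [1.0 / c(y) for y in ys]
    return sum(f(y) * w for y, w in zip(ys, weights)) / sum(weights)


# Hypothetical example with x frozen: c(y) = 2 + sin(y) > 0, period 2*pi.
c = lambda y: 2.0 + math.sin(y)
f = math.sin
t_loop = 2.0 * math.pi / math.sqrt(3.0)  # time for one full turn: int_0^{2 pi} dy / c(y)
time_avg = flow_time_average(f, c, 0.0, t_loop, 20_000)
space_avg = invariant_average(f, c, 2.0 * math.pi)
exact = math.sqrt(3.0) - 2.0  # closed form of the space average for this choice
```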

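Before stating the main results, we need additional notation and definitions. We borrow some
notation from [7] and modify it to fit our needs.

Definition 2.4. For the three possible Regimes $i = 1,2,3$ defined in (1.2) and for $x \in \mathbb{R}^d$, $y \in \mathcal{Y}$,
let

$$\mathcal{L}^1_{x,\theta} = b_\theta(x,y)\cdot\nabla_y + \frac12\,\sigma\sigma^T(x,y) : \nabla_y\nabla_y,$$

$$\mathcal{L}^2_{x,\theta} = \big(\gamma\, b_\theta(x,y) + c_\theta(x,y)\big)\cdot\nabla_y + \frac{\gamma}{2}\,\sigma\sigma^T(x,y) : \nabla_y\nabla_y,$$

$$\mathcal{L}^3_{x,\theta} = c_\theta(x,y)\cdot\nabla_y.$$

For $i = 1,2$ we let $\mathcal{D}(\mathcal{L}^i_{x,\theta}) = C^2(\mathcal{Y})$ and for $i = 3$, $\mathcal{D}(\mathcal{L}^3_{x,\theta}) = C^1(\mathcal{Y})$.

We also define for Regime $i$ a function $\lambda_{i,\theta}(x,y)$, $i = 1,2,3$, as follows.

Definition 2.5. For the three possible Regimes $i = 1,2,3$ defined in (1.2) and for $x \in \mathbb{R}^d$, $y \in \mathcal{Y}$,
define $\lambda_{i,\theta}(x,y) : \mathbb{R}^d \times \mathcal{Y} \to \mathbb{R}^d$ by

$$\lambda_{1,\theta}(x,y) = \left(I + \frac{\partial \chi_\theta}{\partial y}(x,y)\right) c_\theta(x,y), \qquad \lambda_{2,\theta}(x,y) = \gamma\, b_\theta(x,y) + c_\theta(x,y), \qquad \lambda_{3,\theta}(x,y) = c_\theta(x,y),$$

where $\chi_\theta = (\chi_{1,\theta},\dots,\chi_{d,\theta})$ is defined by (2.1) and $I$ is the identity matrix.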

Based on the results in [7], we obtain the following theorem, which essentially is the law of large
numbers for (1.1). Given the results in [7], the additional steps required in order to prove this
theorem are minimal, so we include a short proof in this section as well.

Theorem 2.6. Consider any $x_0 \in \mathbb{R}^d$ and any $T > 0$. Assume Condition 2.1. In addition, in
Regime 1 assume Condition 2.2 and under Regime 3 assume Condition 2.3. Then, for all $\theta \in \Theta$
and $\eta > 0$ and for Regime $i = 1,2,3$, we have

$$\lim_{\varepsilon \downarrow 0} P_\theta\left(\sup_{0\le t\le T}\left|X^\varepsilon_t - \bar X^i_t\right| > \eta\right) = 0, \tag{2.3}$$

where for Regime $i$, $\bar X^i$ is the unique solution to the deterministic equation

$$\bar X^i_t = x_0 + \int_0^t \int_{\mathcal{Y}} \lambda_{i,\theta}(\bar X^i_s, y)\,\mu^i_\theta(dy;\bar X^i_s)\,ds \tag{2.4}$$

and $\mu^i_\theta(dy;x)$ is the invariant measure corresponding to the operator $\mathcal{L}^i_{x,\theta}$ from Definition 2.4.

Proof. Under our assumptions, Theorem 2.8 in [7] guarantees weak convergence of $X^\varepsilon$ to $\bar X^i$ in
$C([0,T])$ for any $T > 0$. Since the limiting process $\bar X^i$ is deterministic and weak convergence to
a constant implies convergence in probability, we obtain the claim of the theorem. Also, due to
our assumptions, the limiting ODEs in (2.4) are well defined and have a unique solution in their
corresponding regime. □



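3. Maximum likelihood estimation

Assume that we observe the process $X^\varepsilon$ in continuous time and denote by $X_T = \{x_t, 0 \le t \le T\}$
the data we obtain. The log-likelihood function for estimating the parameter $\theta$ in the statistical
model (1.1) can be expressed as follows

$$Z^\varepsilon_{\theta,T}(X_T) = \frac{1}{\varepsilon}\int_0^T \left\langle \frac{\varepsilon}{\delta}\, b_\theta + c_\theta,\, dx_s\right\rangle_{\alpha}\!\left(x_s, \frac{x_s}{\delta}\right) - \frac{1}{2\varepsilon}\int_0^T \left\|\frac{\varepsilon}{\delta}\, b_\theta + c_\theta\right\|^2_{\alpha}\!\left(x_s, \frac{x_s}{\delta}\right) ds, \tag{3.1}$$

where we denote $\alpha(x,y) = \sigma\sigma^T(x,y)$ and for any positive definite matrix $K$

$$\langle p, q\rangle_K = \left\langle K^{-1/2}p,\, K^{-1/2}q\right\rangle \quad\text{and}\quad \|p\|^2_K = \langle p, p\rangle_K. \tag{3.2}$$

Sometimes, we will omit the subscript $K$ if $K = I$. Essentially, we define the likelihood function
as the Radon–Nikodym derivative $\frac{dP^\varepsilon_\theta}{dP^\varepsilon_0}(X_T)$, so that $Z^\varepsilon_{\theta,T}(X_T) = \log\frac{dP^\varepsilon_\theta}{dP^\varepsilon_0}(X_T)$,
where $P^\varepsilon_\theta$ is the measure for (1.1) and $P^\varepsilon_0$ the measure for (1.1) when the drift term is equal to zero.
Therefore, for fixed $\varepsilon, \delta$, we define the maximum likelihood estimator (MLE) of $\theta$ to be

$$\hat\theta^\varepsilon = \arg\max_{\theta\in\Theta} Z^\varepsilon_{\theta,T}(X_T). \tag{3.3}$$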
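A discretized version of (3.1) and (3.3) may help fix ideas. The sketch below is an illustration under explicit assumptions, not the paper's procedure: it takes $d = 1$, replaces the stochastic and Lebesgue integrals in (3.1) by left-point (Itô) Riemann sums on an equally spaced observation grid, and approximates the arg max in (3.3) by a search over a finite grid of candidate values; the function names are hypothetical.

```python
def log_likelihood(theta, path, dt, eps, delta, b, c, sigma):
    """Left-point (Ito) Riemann-sum approximation of (3.1) for d = 1, where alpha = sigma**2.

    b(theta, x, y), c(theta, x, y) and sigma(x, y) are user-supplied; path holds x_0, x_dt, ..., x_T.
    """
    total = 0.0
    for x_now, x_next in zip(path[:-1], path[1:]):
        y = x_now / delta
        beta = (eps / delta) * b(theta, x_now, y) + c(theta, x_now, y)  # drift at (x, x/delta)
        alpha = sigma(x_now, y) ** 2
        total += beta * (x_next - x_now) / alpha - 0.5 * beta * beta * dt / alpha
    return total / eps


def grid_mle(thetas, path, dt, eps, delta, b, c, sigma):
    """Approximate (3.3) by maximizing the discretized log-likelihood over a finite grid."""
    return max(thetas, key=lambda th: log_likelihood(th, path, dt, eps, delta, b, c, sigma))


# Hypothetical check on a noise-free path of dx = -theta0 * x dt (b = 0, sigma = 1).
theta0, dt, n = 1.5, 1e-3, 1000
path = [1.0]
for _ in range(n):
    path.append(path[-1] - theta0 * path[-1] * dt)
b = lambda th, x, y: 0.0
c = lambda th, x, y: -th * x
sigma = lambda x, y: 1.0
estimate = grid_mle([0.5, 1.0, 1.5, 2.0], path, dt, 0.1, 0.01, b, c, sigma)
```

On this noise-free path the discretized log-likelihood is proportional to $\theta\theta_0 - \theta^2/2$, so the grid search returns $\theta_0 = 1.5$.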
The presence of the small parameters $\varepsilon$ and $\delta$ complicates the estimation of $\theta$ significantly. Our
approach is to find the limiting likelihood (in the appropriate sense) for each Regime $i = 1,2,3$,
that is

$$Z^i_{\theta_0,\theta,T}(\bar X^i) = \lim_{\varepsilon\downarrow 0}\varepsilon Z^\varepsilon_{\theta,T}(X_T). \tag{3.4}$$

Then, we prove consistency and derive asymptotic properties of the MLE $\hat\theta^\varepsilon$ by studying properties
of the prelimiting log-likelihood $Z^\varepsilon_{\theta,T}(X_T)$ and of the limiting log-likelihood $Z^i_{\theta_0,\theta,T}(\bar X^i)$.

In particular, as we shall see in Section 4.1, based on the analysis of the log-likelihood function
(3.1), we prove that the MLE is a consistent and asymptotically unbiased estimator of the true
value $\theta_0$ under Regime 1 with $b = 0$ and under Regimes 2 and 3. Under the same framework, we also
prove, in Section 5, that the MLE $\hat\theta^\varepsilon$ is asymptotically normal.

On the other hand, as we shall see in Section 4.2, things get more complicated under Regime
1 when $b \neq 0$. In this case, the likelihood function (3.1) does not necessarily have a well defined
limit due to the terms that are multiplied by $\varepsilon/\delta$ (recall that in this case $\varepsilon/\delta \uparrow \infty$ as $\varepsilon \downarrow 0$). We
choose to resolve this issue by taking the limit of an appropriately re-scaled and centered version
of the original log-likelihood (a pseudo log-likelihood). Under certain conditions, this pseudo log-likelihood approach overcomes the convergence issue and a well defined limit exists. However, the
pseudo maximum likelihood estimator is not asymptotically unbiased, even though the bias is
explicitly characterized.

The bias of maximum likelihood estimators in the presence of "unbounded drift
terms", such as the term $\frac{\varepsilon}{\delta}\int_0^T b_\theta(x_s, x_s/\delta)\,ds$ with $\varepsilon/\delta \uparrow \infty$, is well known in the literature. In the
context of $\varepsilon = 1$ and $\delta \downarrow 0$, which corresponds to Regime 1, the problem has also been studied in
[2, 16, 21, 22] under different scenarios and conditions, and it is shown there that the maximum
likelihood estimator is not asymptotically unbiased and one may need to resort to sub-sampling of
the data at appropriate rates in order to produce unbiased estimators. In the cases
studied in [3, 22] the issue was treated with appropriate sub-sampling of the data. The article [16]
followed a semi-parametric approach assuming a special structure of the coefficients. In this work
we do not address the unbiasedness issue in general. Nevertheless, we provide an explicit formula for the
asymptotic error in the transformed log-likelihood function. Moreover, we apply our results to the
case of small noise diffusion in a two-scale potential field, see Section 6. In this case, even though
the original estimator is not asymptotically unbiased, we can construct an asymptotically unbiased
estimator and also derive a central limit theorem for the proposed estimator.
4. Limiting Likelihood

We first study the limiting likelihood for Regime 1 when $b = 0$ and for Regimes 2 and 3, and then
the proposed pseudo limiting likelihood for Regime 1 when $b \neq 0$.

4.1. Limiting Likelihood for Regime 1 when $b = 0$ and for Regimes 2, 3. In this section,
we consider the limit of the likelihood function $Z^\varepsilon_{\theta,T}(X_T)$, defined by (3.1), for Regime 1 when $b = 0$
and for Regimes 2 and 3.

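Let us define the following functions.

Definition 4.1. For $z = \{x_t, 0 \le t \le T\}$, $x \in \mathbb{R}^d$, $y \in \mathcal{Y}$ and for the three possible Regimes
$i = 1,2,3$ defined in (1.2), define

$$Z^1_{\theta_0,\theta,T}(z) = \int_0^T\!\!\int_{\mathcal{Y}} \langle c_\theta, c_{\theta_0}\rangle_{\alpha}(x_s,y)\,\mu^1_{\theta_0}(dy;x_s)\,ds - \frac12\int_0^T\!\!\int_{\mathcal{Y}} \|c_\theta\|^2_{\alpha}(x_s,y)\,\mu^1_{\theta_0}(dy;x_s)\,ds,$$

$$Z^2_{\theta_0,\theta,T}(z) = \int_0^T\!\!\int_{\mathcal{Y}} \langle \gamma b_\theta + c_\theta, \gamma b_{\theta_0} + c_{\theta_0}\rangle_{\alpha}(x_s,y)\,\mu^2_{\theta_0}(dy;x_s)\,ds - \frac12\int_0^T\!\!\int_{\mathcal{Y}} \|\gamma b_\theta + c_\theta\|^2_{\alpha}(x_s,y)\,\mu^2_{\theta_0}(dy;x_s)\,ds,$$

$$Z^3_{\theta_0,\theta,T}(z) = \int_0^T\!\!\int_{\mathcal{Y}} \langle c_\theta, c_{\theta_0}\rangle_{\alpha}(x_s,y)\,\mu^3_{\theta_0}(dy;x_s)\,ds - \frac12\int_0^T\!\!\int_{\mathcal{Y}} \|c_\theta\|^2_{\alpha}(x_s,y)\,\mu^3_{\theta_0}(dy;x_s)\,ds.$$

We then prove the following theorem.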

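Theorem 4.2. Let the assumptions of Theorem 2.6 hold. Let $X_T = \{x_t, 0 \le t \le T\}$ be a sample
path of (1.1) at $\theta = \theta_0$. In the case of Regime 1 we assume that $b_\theta = 0$. Then, under Regime $i = 1,2,3$, the sequence $\{\varepsilon Z^\varepsilon_{\theta,T}(X_T), \varepsilon > 0\}$ converges in $P_{\theta_0}$ probability, uniformly in $\theta \in \Theta$, to $Z^i_{\theta_0,\theta,T}(\bar X^i)$
from Definition 4.1, where $\bar X^i$ is the solution to the corresponding limiting ODE from Theorem
2.6. In particular, for any $\eta > 0$

$$\lim_{\varepsilon\downarrow 0} P_{\theta_0}\left(\sup_{\theta\in\Theta}\left|\varepsilon Z^\varepsilon_{\theta,T}(X_T) - Z^i_{\theta_0,\theta,T}(\bar X^i)\right| > \eta\right) = 0. \tag{4.1}$$

Lastly, in each regime, the function $\theta \mapsto Z^i_{\theta_0,\theta,T}(\bar X^i)$ is maximized at $\theta = \theta_0$.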



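Proof. Since $X_T = \{x_t, 0 \le t \le T\}$ is a sample path of (1.1) at $\theta = \theta_0$, we have $dx_s = \left[\frac{\varepsilon}{\delta}b_{\theta_0} + c_{\theta_0}\right](x_s, \frac{x_s}{\delta})\,ds + \sqrt{\varepsilon}\,\sigma(x_s, \frac{x_s}{\delta})\,dW_s$, and therefore

$$\varepsilon Z^\varepsilon_{\theta,T}(X_T) = \mathcal{A}^\varepsilon_{\theta,T} + \sqrt{\varepsilon}\,\mathcal{M}^\varepsilon_{\theta,T}, \tag{4.2}$$

where

$$\mathcal{A}^\varepsilon_{\theta,T} = \int_0^T \left\langle \frac{\varepsilon}{\delta}b_\theta + c_\theta,\, \frac{\varepsilon}{\delta}b_{\theta_0} + c_{\theta_0}\right\rangle_{\alpha}\!\left(x_s, \frac{x_s}{\delta}\right) ds - \frac12\int_0^T \left\|\frac{\varepsilon}{\delta}b_\theta + c_\theta\right\|^2_{\alpha}\!\left(x_s, \frac{x_s}{\delta}\right) ds \tag{4.3}$$

and

$$\mathcal{M}^\varepsilon_{\theta,t} = \int_0^t \left\langle \frac{\varepsilon}{\delta}b_\theta + c_\theta,\, \sigma\,dW_s\right\rangle_{\alpha}\!\left(x_s, \frac{x_s}{\delta}\right). \tag{4.4}$$

Then the standard averaging principle for locally periodic diffusions (see Chapter 3 of [1]), the fact
that the corresponding invariant measure $\mu^i_{\theta_0}(dy;x)$ is continuous as a function of $x$, and Theorem
2.6 imply that for any $p \ge 1$

$$E\left|\mathcal{A}^\varepsilon_{\theta,T} - Z^i_{\theta_0,\theta,T}(\bar X^i)\right|^p \to 0, \quad \text{as } \varepsilon \downarrow 0. \tag{4.5}$$

Moreover, the Burkholder–Davis–Gundy inequality [15] applied to the stochastic integral $\mathcal{M}^\varepsilon_{\theta,\cdot}$ and
Condition 2.1 imply

$$E\sup_{0\le t\le T}\left|\mathcal{M}^\varepsilon_{\theta,t}\right|^p \le C \tag{4.6}$$

for some constant $C > 0$, uniformly in $\theta \in \Theta$.

Thus, the claimed convergence follows by Chebyshev's inequality and the uniform
convergence in $\theta \in \Theta$. The fact that the limit is maximized at $\theta = \theta_0$ is easily seen by
completing the square in the expressions for $Z^i_{\theta_0,\theta,T}$ in Definition 4.1. For example, in the case of
Regime 1,

$$Z^1_{\theta_0,\theta,T}(z) = \frac12\int_0^T\!\!\int_{\mathcal{Y}} \|c_{\theta_0}\|^2_{\alpha}(x_s,y)\,\mu^1_{\theta_0}(dy;x_s)\,ds - \frac12\int_0^T\!\!\int_{\mathcal{Y}} \|c_\theta - c_{\theta_0}\|^2_{\alpha}(x_s,y)\,\mu^1_{\theta_0}(dy;x_s)\,ds, \tag{4.7}$$

and thus the maximum is attained at $\theta = \theta_0$. Similarly for Regimes 2 and 3. □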
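The completed square behind (4.7) is the following pointwise identity, valid for any vectors and any positive definite weight $\alpha$; integrating it against $\mu^1_{\theta_0}(dy;x_s)\,ds$ gives (4.7), and replacing $c$ by $\gamma b + c$ gives the analogous statement for Regime 2:

$$\langle c_\theta, c_{\theta_0}\rangle_\alpha - \frac12\|c_\theta\|^2_\alpha = \frac12\|c_{\theta_0}\|^2_\alpha - \frac12\left(\|c_\theta\|^2_\alpha - 2\langle c_\theta, c_{\theta_0}\rangle_\alpha + \|c_{\theta_0}\|^2_\alpha\right) = \frac12\|c_{\theta_0}\|^2_\alpha - \frac12\|c_\theta - c_{\theta_0}\|^2_\alpha.$$

Only the last term depends on $\theta$; it is nonpositive and vanishes at $\theta = \theta_0$.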

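Before we continue, we need to impose the following identifiability condition for the true value $\theta_0$
of the parameter $\theta$.

Condition 4.3. For all $\eta > 0$,

$$\sup_{u:\,|u|>\eta}\left\{Z^i_{\theta_0,\theta_0+u,T}(\bar X^i) - Z^i_{\theta_0,\theta_0,T}(\bar X^i)\right\} < -\eta < 0.$$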

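Theorem 4.4. Let $\hat\theta^\varepsilon = \arg\max_{\theta\in\Theta} Z^\varepsilon_{\theta,T}(X_T)$. Under Condition 4.3 and the assumptions of
Theorem 4.2, the MLE sequence $\{\hat\theta^\varepsilon, \varepsilon > 0\}$ converges in $P_{\theta_0}$ probability to the true parameter $\theta_0$. In
particular, for any $\eta > 0$ we have

$$\lim_{\varepsilon\downarrow 0} P_{\theta_0}\left(\left|\hat\theta^\varepsilon - \theta_0\right| > \eta\right) = 0. \tag{4.8}$$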



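Proof. To simplify notation, write $\bar Z^\varepsilon_\theta = \varepsilon Z^\varepsilon_{\theta,T}(X_T)$ and $Z_\theta = Z^i_{\theta_0,\theta,T}(\bar X^i)$; since $\varepsilon > 0$, $\hat\theta^\varepsilon$ also
maximizes $\theta \mapsto \bar Z^\varepsilon_\theta$. For all $\eta > 0$, we have that

$$P_{\theta_0}\left(|\hat\theta^\varepsilon - \theta_0| > \eta\right) \le P_{\theta_0}\left(\sup_{|u|>\eta}\left(\bar Z^\varepsilon_{\theta_0+u} - \bar Z^\varepsilon_{\theta_0}\right) \ge 0\right) \le P_{\theta_0}\left(\sup_{|u|>\eta}\left[\left(\bar Z^\varepsilon_{\theta_0+u} - \bar Z^\varepsilon_{\theta_0}\right) - \left(Z_{\theta_0+u} - Z_{\theta_0}\right)\right] \ge -\sup_{|u|>\eta}\left(Z_{\theta_0+u} - Z_{\theta_0}\right)\right).$$

Condition 4.3 gives that $-\sup_{|u|>\eta}\left(Z_{\theta_0+u} - Z_{\theta_0}\right) > \eta$, and therefore

$$P_{\theta_0}\left(|\hat\theta^\varepsilon - \theta_0| > \eta\right) \le P_{\theta_0}\left(\sup_{|u|>\eta}\left[\left(\bar Z^\varepsilon_{\theta_0+u} - Z_{\theta_0+u}\right) - \left(\bar Z^\varepsilon_{\theta_0} - Z_{\theta_0}\right)\right] > \eta\right).$$

By splitting on the event $\{|\bar Z^\varepsilon_{\theta_0} - Z_{\theta_0}| > \eta/2\}$ and its complement, we have

$$P_{\theta_0}\left(|\hat\theta^\varepsilon - \theta_0| > \eta\right) \le P_{\theta_0}\left(\sup_{|u|>\eta}\left|\bar Z^\varepsilon_{\theta_0+u} - Z_{\theta_0+u}\right| > \frac{\eta}{2}\right) + P_{\theta_0}\left(\left|\bar Z^\varepsilon_{\theta_0} - Z_{\theta_0}\right| > \frac{\eta}{2}\right).$$

The result follows by compactness of $\Theta$ and by the uniform convergence of Theorem 4.2. □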



4.2. Pseudo Limiting Likelihood for Regime 1 when $b \neq 0$. In the case of Regime 1 with
$b \neq 0$, the situation is more involved because the limit of the log-likelihood $\varepsilon Z^\varepsilon_{\theta,T}(X_T)$ is not
well defined. This is due to the $\varepsilon/\delta$ and $\varepsilon^2/\delta^2$ terms that appear in the expression of $\varepsilon Z^\varepsilon_{\theta,T}(X_T)$. This
leads us to re-parameterize the log-likelihood, so that it will have a well defined limit. However,
we need to re-parameterize the log-likelihood in such a way that the limiting expression
coincides with the expression of Section 4.1 for $b = 0$ and, at the same time, maintains tractability
and simplicity.

Let us denote by $Z^\varepsilon_{\theta,T}(X_T; 0)$ the log-likelihood function (3.1) with $b = 0$. We define the modified
log-likelihood function

(4.9) ZIAXt) = ZIt{Xt) + ZIt{Xt; 0). 

To characterize the limit, we first need to define several quantities. Before doing so, we impose an additional
assumption.
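Condition 4.5. Let the coefficients $b_\theta$, $c_\theta$ and $\alpha$ be such that

$$\int_{\mathcal{Y}} \langle c_\theta, b_{\theta_0}\rangle_{\alpha}(x,y)\,\mu^1_{\theta_0}(dy;x) = 0$$

for all $\theta, \theta_0 \in \Theta$ and for all $x \in \mathbb{R}^d$.

For example, Condition 4.5 is trivially satisfied under Condition 2.2 if the coefficients $c_\theta$ and $\alpha$
are independent of $y \in \mathcal{Y}$ (see also Remark 4.8 below).

Then, we can consider the auxiliary partial differential equation

$$\mathcal{L}^1_{x,\theta_0}\Phi(x,y) = -\langle c_\theta, b_{\theta_0}\rangle_{\alpha}(x,y), \qquad \int_{\mathcal{Y}} \Phi(x,y)\,\mu^1_{\theta_0}(dy;x) = 0. \tag{4.10}$$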

Under Condition 4.5, this Poisson equation has a unique bounded, periodic in $y$ and smooth solution
(see Theorem 3.3.4 of [3]). In order to emphasize the dependence of $\Phi$ on $\theta$ and $\theta_0$, we shall often write
$\Phi_{\theta,\theta_0}(x,y)$.

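Next, we define

$$J_{\theta,\theta_0,T}(z) = \int_0^T\!\!\int_{\mathcal{Y}} \left[\langle b_\theta, b_{\theta_0}\rangle_{\alpha} + \langle c_\theta, c_{\theta_0}\rangle_{\alpha}\right](x_s,y)\,\mu^1_{\theta_0}(dy;x_s)\,ds - \frac12\int_0^T\!\!\int_{\mathcal{Y}} \left[\|b_\theta\|^2_{\alpha} + \|c_\theta\|^2_{\alpha}\right](x_s,y)\,\mu^1_{\theta_0}(dy;x_s)\,ds \tag{4.11}$$

and

$$H_{\theta,\theta_0,T}(z) = \int_0^T\!\!\int_{\mathcal{Y}} \left\langle c_{\theta_0}(x_s,y), \nabla_y\Phi_{\theta,\theta_0}(x_s,y)\right\rangle \mu^1_{\theta_0}(dy;x_s)\,ds. \tag{4.12}$$

For the limiting behavior we prove the following theorem.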

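Theorem 4.6. Let Conditions 2.1 and 4.5 hold and consider Regime 1. Let $X_T = \{x_t, 0 \le t \le T\}$
be a sample path of (1.1) at $\theta = \theta_0$. Then the sequence $\{\tilde Z^\varepsilon_{\theta,T}(X_T), \varepsilon > 0\}$, as defined by (4.9), converges
in $P_{\theta_0}$ probability, uniformly in $\theta \in \Theta$, to $\tilde Z_{\theta_0,\theta,T}(\bar X^1)$, where

$$\tilde Z_{\theta_0,\theta,T}(z) = J_{\theta,\theta_0,T}(z) + H_{\theta,\theta_0,T}(z). \tag{4.13}$$

In particular, for any $\eta > 0$

$$\lim_{\varepsilon\downarrow 0} P_{\theta_0}\left(\sup_{\theta\in\Theta}\left|\tilde Z^\varepsilon_{\theta,T}(X_T) - \tilde Z_{\theta_0,\theta,T}(\bar X^1)\right| > \eta\right) = 0. \tag{4.14}$$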



Before proceeding with the proof of the theorem, we make two remarks.

Remark 4.7. When $b = 0$ we get that the bias $H_{\theta,\theta_0,T}(z) = 0$ (since in this case $\Phi = 0$), and
we get back the result of Theorem 4.2. The term $J_{\theta,\theta_0,T}(z)$ is maximized at $\theta = \theta_0$ as in Theorem
4.2. However, this is not true in general for $H_{\theta,\theta_0,T}(z)$, and hence for the limit (4.13). This implies that maximum likelihood in
general fails for Regime 1, in the sense that the pseudo MLE is asymptotically biased.
Remark 4.8. When Condition 4.5 is not satisfied, the situation is more complicated. Using the
modified log-likelihood (4.9), Condition 4.5 is necessary in order for (4.10) to have a solution. This
follows by the Fredholm alternative as in Theorem 3.3.4 of [3]. The use of the Poisson equation (4.10)
is an essential tool in the proof of Theorem 4.6. There does not seem to be an obvious way to
reparameterize the likelihood in such a way that it will have a well defined limit and at the same
time maintain tractability. However, as we shall see in Section 6, Theorem 4.6 covers one of the
cases of interest, namely the first order Langevin equation with a two-scale potential. To be more
precise, it covers the case of a small noise diffusion in two-scale potentials of the form (6.2), with
$b_\theta(x,y) = -\nabla Q_\theta(y)$, $c_\theta(x,y) = -\nabla V_\theta(x)$ and $\sigma(x,y) = $ constant.

Proof of Theorem 4.6. After some term rearrangement, we get



(4.15) 




We study the limiting behavior of the terms on the right-hand side of (4.15). It is relatively easy
to see that the terms multiplied by powers of $\delta/\varepsilon$ converge to zero in $p$-th mean for every $p \ge 1$. Moreover, the
quadratic variation of the stochastic integral in (4.15) has a well defined limit in $p$-th mean,
which, together with the fact that the integral is multiplied by $\sqrt{\varepsilon}$, implies that this term on the right-hand
side of (4.15) converges to zero in $p$-th mean.

Therefore, it remains to study the terms $K^\varepsilon_1$, $K^\varepsilon_2$ and $K^\varepsilon_3$. By the standard averaging principle for
locally periodic diffusions, as in the proof of Theorem 4.2, $K^\varepsilon_1 + K^\varepsilon_2$ converges in $P_{\theta_0}$ probability, uniformly in
$\theta \in \Theta$, to $J_{\theta,\theta_0,T}(\bar X^1)$.
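Lastly, we need to study the term $K^\varepsilon_3$. For this purpose we apply Itô's formula to $\Phi(x, x/\delta)$, which
satisfies (4.10), with $x = X^\varepsilon_t$, to get

$$d\Phi\!\left(X^\varepsilon_t, \tfrac{X^\varepsilon_t}{\delta}\right) = \left[\frac{\varepsilon}{\delta^2}\mathcal{L}^1_{x,\theta_0}\Phi + \frac{1}{\delta}\langle c_{\theta_0}, \nabla_y\Phi\rangle + \frac{\varepsilon}{\delta}\langle b_{\theta_0}, \nabla_x\Phi\rangle + \langle c_{\theta_0}, \nabla_x\Phi\rangle + \frac{\varepsilon}{\delta}\,\alpha : \nabla_x\nabla_y\Phi + \frac{\varepsilon}{2}\,\alpha : \nabla_x\nabla_x\Phi\right] dt + \frac{\sqrt{\varepsilon}}{\delta}\langle \nabla_y\Phi, \sigma\,dW_t\rangle + \sqrt{\varepsilon}\,\langle \nabla_x\Phi, \sigma\,dW_t\rangle, \tag{4.16}$$

where all functions are evaluated at $(X^\varepsilon_t, X^\varepsilon_t/\delta)$ and $\mathcal{L}^1_{x,\theta_0}$ acts with $x = X^\varepsilon_t$ frozen. Hence, multiplying by $\delta$ and
recalling that $\Phi$ satisfies (4.10), which has a unique, periodic in $y$, bounded and smooth
solution due to Condition 4.5, we obtain

$$\begin{aligned}
\frac{\varepsilon}{\delta}\int_0^T \langle c_\theta, b_{\theta_0}\rangle_{\alpha}\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) ds &= -\frac{\varepsilon}{\delta}\int_0^T \mathcal{L}^1_{x,\theta_0}\Phi\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) ds \\
&= -\delta\left[\Phi\!\left(X^\varepsilon_T, \tfrac{X^\varepsilon_T}{\delta}\right) - \Phi\!\left(x_0, \tfrac{x_0}{\delta}\right)\right] \\
&\quad + \int_0^T \left[\varepsilon\langle b_{\theta_0}, \nabla_x\Phi\rangle + \delta\langle c_{\theta_0}, \nabla_x\Phi\rangle + \varepsilon\,\alpha : \nabla_x\nabla_y\Phi + \frac{\varepsilon\delta}{2}\,\alpha : \nabla_x\nabla_x\Phi\right]\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) ds \\
&\quad + \sqrt{\varepsilon}\int_0^T \langle \nabla_y\Phi, \sigma\,dW_s\rangle\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) + \sqrt{\varepsilon}\,\delta\int_0^T \langle \nabla_x\Phi, \sigma\,dW_s\rangle\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) \\
&\quad + \int_0^T \langle c_{\theta_0}, \nabla_y\Phi\rangle\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) ds.
\end{aligned} \tag{4.17}$$

From this representation the result follows, since the last term $\int_0^T \langle c_{\theta_0}, \nabla_y\Phi\rangle(X^\varepsilon_s, X^\varepsilon_s/\delta)\,ds$
converges in $P_{\theta_0}$ probability, uniformly in $\theta \in \Theta$, to $H_{\theta,\theta_0,T}(\bar X^1)$. The rest of the terms on the
right-hand side of (4.17) converge to zero in $P_{\theta_0}$ probability, uniformly in $\theta \in \Theta$, due
to the boundedness of $\Phi$ and its derivatives and Condition 2.1. This concludes the proof of the
theorem. □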



5. Central Limit Theorem for Regime 1 when $b = 0$ and for Regimes 2 and 3

In this section we state and prove a central limit theorem (CLT) for the maximum likelihood
estimator $\hat\theta^\varepsilon$ of $\theta$ in the case of Subsection 4.1.

The main structural assumption is that under Regime 1 we have $b_\theta(x,y) = 0$. For Regime 2 when $b_\theta \neq 0$, one essentially just replaces, in the final formula, the function $c_\theta(x,y)$
by the function $\gamma b_\theta(x,y) + c_\theta(x,y)$. So, for notational convenience and without loss of generality, let us assume that $b_\theta(x,y) = 0$
for all three regimes.



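We define the normed log-likelihood ratio

$$M_\varepsilon(u) = \log\frac{dP^\varepsilon_{\theta+\phi(\varepsilon,\theta)u}}{dP^\varepsilon_\theta}(X_T),$$

where the normalizing matrix $\phi(\varepsilon,\theta)$ is specified in the proof of Theorem 5.1 below. With $P_\theta$ probability 1 we can write, setting $\theta_u = \theta + \phi(\varepsilon,\theta)u$,

$$M_\varepsilon(u) = \frac{1}{\sqrt{\varepsilon}}\int_0^T \left\langle c_{\theta_u} - c_\theta, \sigma\,dW_s\right\rangle_{\alpha}\!\left(x_s, \tfrac{x_s}{\delta}\right) - \frac{1}{2\varepsilon}\int_0^T \left\|c_{\theta_u} - c_\theta\right\|^2_{\alpha}\!\left(x_s, \tfrac{x_s}{\delta}\right) ds. \tag{5.1}$$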

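For notational convenience, we also define the quantities

$$S(\theta,x,y) = \alpha^{-1/2}(x,y)\,\nabla_\theta c_\theta(x,y), \qquad q^i(x,\theta) = \int_{\mathcal{Y}} S^T(\theta,x,y)\,S(\theta,x,y)\,\mu^i_\theta(dy;x), \tag{5.2}$$

where $\nabla_\theta c_\theta$ denotes the $d \times p$ Jacobian matrix of $c_\theta$ with respect to $\theta$. The Fisher information matrix is defined to be

$$I_i(\theta) = \int_0^T q^i(\bar X^i_s, \theta)\,ds. \tag{5.3}$$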
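For a concrete instance of (5.2)–(5.3), take the hypothetical scalar model $c_\theta(x) = -\theta x$, $\sigma = \sqrt{2D}$, $b = 0$ (so $p = d = 1$). Then $S = -x/\sqrt{2D}$ and $q^i(x,\theta) = x^2/(2D)$ in every regime, the limiting ODE is $\dot{\bar X}_t = -\theta\bar X_t$, and $I(\theta) = x_0^2(1 - e^{-2\theta T})/(4D\theta)$. The sketch compares a midpoint-rule approximation of (5.3) with this closed form and computes the scale $\sqrt{\varepsilon}\,I^{-1/2}(\theta)$ of the normalizing matrix used for Theorem 5.1; names and parameter values are illustrative.

```python
import math


def fisher_information(q, xbar, T, n=10_000):
    """Midpoint-rule approximation of I(theta) = int_0^T q(xbar(s)) ds, cf. (5.3)."""
    h = T / n
    return h * sum(q(xbar((k + 0.5) * h)) for k in range(n))


# Hypothetical scalar model: c_theta(x) = -theta * x, sigma = sqrt(2 D), b = 0.
theta, D, x0, T, eps = 1.0, 0.5, 2.0, 1.0, 0.01
xbar = lambda t: x0 * math.exp(-theta * t)  # solution of the limiting ODE dx/dt = -theta x
q = lambda x: x * x / (2.0 * D)             # q(x, theta) = S^2 with S = -x / sqrt(2 D)
I_num = fisher_information(q, xbar, T)
I_exact = x0**2 * (1.0 - math.exp(-2.0 * theta * T)) / (4.0 * D * theta)
scale = math.sqrt(eps / I_num)              # sqrt(eps) * I^{-1/2}(theta)
```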



We then have the following theorem. 

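Theorem 5.1. Let the conditions of Theorem 2.6 hold. Consider Regime $i = 1,2,3$ and let $\hat\theta^\varepsilon$ be
the maximum likelihood estimator of $\theta$. With the notation above, assume that

(i) The function $c_\theta(x,y)$ is twice continuously differentiable in $\theta$ with bounded derivatives.

(ii) The Fisher information matrix $I_i(\theta)$ is positive definite uniformly in $\theta \in \Theta$.

Then, in distribution under $P_\theta$, the following central limit result holds:

$$\frac{1}{\sqrt{\varepsilon}}\,I_i^{1/2}(\theta)\left(\hat\theta^\varepsilon - \theta\right) \Rightarrow N(0, I), \quad \text{as } \varepsilon \downarrow 0. \tag{5.4}$$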

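Proof. For notational convenience we omit writing the subscript $i$, which denotes the particular
regime under consideration.

Let us denote $\theta_{u_\varepsilon} = \theta + \phi(\varepsilon,\theta)u_\varepsilon$, where $\phi(\varepsilon,\theta) = \sqrt{\varepsilon}\,I^{-1/2}(\theta)$. We assume that $\theta$ belongs to
a compact subset of $\Theta$, denoted by $\mathbb{K}$, and let $u_\varepsilon \to u$ as $\varepsilon \downarrow 0$. We start by rewriting the normed
likelihood ratio as follows:

$$M_\varepsilon(u_\varepsilon) = J^\varepsilon_1(\theta) + J^\varepsilon_2(\theta) + J^\varepsilon_3(\theta) + J^\varepsilon_4(\theta), \tag{5.5}$$

where, with all integrands evaluated at $(X^\varepsilon_s, X^\varepsilon_s/\delta)$ unless indicated otherwise,

$$J^\varepsilon_1 = -\frac12\int_0^T \left\langle I^{-1/2}(\theta)u_\varepsilon, q(\bar X_s,\theta)\,I^{-1/2}(\theta)u_\varepsilon\right\rangle ds = -\frac12\langle u_\varepsilon, u_\varepsilon\rangle,$$

$$J^\varepsilon_2 = \int_0^T \left\langle S(\theta,\cdot)\,I^{-1/2}(\theta)u_\varepsilon,\, \alpha^{-1/2}\sigma\,dW_s\right\rangle,$$

$$J^\varepsilon_3 = \frac{1}{\sqrt{\varepsilon}}\int_0^T \left\langle c_{\theta_{u_\varepsilon}} - c_\theta - \nabla_\theta c_\theta\,\phi(\varepsilon,\theta)u_\varepsilon,\, \sigma\,dW_s\right\rangle_{\alpha},$$

$$J^\varepsilon_4 = -\frac{1}{2\varepsilon}\int_0^T \left\|c_{\theta_{u_\varepsilon}} - c_\theta\right\|^2_{\alpha} ds + \frac12\int_0^T \left\langle I^{-1/2}(\theta)u_\varepsilon, q(\bar X_s,\theta)\,I^{-1/2}(\theta)u_\varepsilon\right\rangle ds.$$

The identity for $J^\varepsilon_1$ holds by the following chain of identities

$$\int_0^T\!\!\int_{\mathcal{Y}} \left|S(\theta,\bar X_s,y)v\right|^2 \mu_\theta(dy;\bar X_s)\,ds = \int_0^T \langle v, q(\bar X_s,\theta)v\rangle\,ds = \langle I(\theta)v, v\rangle, \tag{5.6}$$

which are applied for $v = I^{-1/2}(\theta)u_\varepsilon$.

The goal is to prove that $M_\varepsilon(u_\varepsilon) = \langle u, \Delta\rangle - \frac12\|u\|^2 + R(\varepsilon,\theta)$, where $\Delta$ is distributed as normal
$N(0, I)$ and $R(\varepsilon,\theta) \to 0$ as $\varepsilon \downarrow 0$ in probability, uniformly in $\theta \in \mathbb{K}$. This will establish that
the family $\{P_\theta : \theta \in \mathbb{K}\}$ is uniformly asymptotically normal with normalizing matrix $\phi(\varepsilon,\theta) =
\sqrt{\varepsilon}\,I^{-1/2}(\theta)$, which then proves the claimed asymptotic normality of the theorem (see Theorem 1.6
in Chapter 1 of [19]).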

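It is clear that

$$J^\varepsilon_1 = -\frac12\langle u_\varepsilon, u_\varepsilon\rangle \to -\frac12\|u\|^2, \quad \text{as } \varepsilon \downarrow 0. \tag{5.7}$$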

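Moreover, due to averaging and the law of large numbers result Theorem 2.6, the definition of
the Fisher information matrix $I(\theta)$ implies that the quadratic variation $\int_0^T |S(\theta, X^\varepsilon_s, X^\varepsilon_s/\delta)\,I^{-1/2}(\theta)u_\varepsilon|^2\,ds$ of

$$J^\varepsilon_2 = \int_0^T \left\langle I^{-1/2}(\theta)u_\varepsilon,\, S^T\!\left(\theta, X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right)\alpha^{-1/2}\sigma\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) dW_s\right\rangle \tag{5.8}$$

converges in probability to $\|u\|^2$. Hence $J^\varepsilon_2$ converges in distribution with respect to $P_\theta$, uniformly in $\theta \in \mathbb{K}$, to $\langle u, \Delta\rangle$, where $\Delta$ is distributed
as $N(0, I)$, as $\varepsilon \downarrow 0$.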

Thus it remains to consider the term $R(\varepsilon,\theta) = J^\varepsilon_3(\theta) + J^\varepsilon_4(\theta)$. We shall show that both terms
converge to zero in $P_\theta$ probability as $\varepsilon \downarrow 0$, uniformly in $\theta \in \mathbb{K}$.
We start by observing that, for $\xi \in \mathbb{R}^p$,

$$c_{\theta+\xi} - c_\theta = \int_0^1 \nabla_\theta c_{\theta+h\xi}\,\xi\,dh. \tag{5.9}$$

Then we can write

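$$\begin{aligned}
E\int_0^T \frac{1}{\varepsilon}\left\|c_{\theta_{u_\varepsilon}} - c_\theta - \nabla_\theta c_\theta\,\phi(\varepsilon,\theta)u_\varepsilon\right\|^2_{\alpha}\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) ds
&\le \left|I^{-1/2}(\theta)u_\varepsilon\right|^2 \sup_{\theta\in\mathbb{K}}\,\sup_{|\xi|\le C\sqrt{\varepsilon}} E\int_0^T\!\!\int_0^1 \left|\alpha^{-1/2}\left(\nabla_\theta c_{\theta+h\xi} - \nabla_\theta c_\theta\right)\right|^2\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) dh\,ds \\
&\le C \sup_{\theta\in\mathbb{K}}\,\sup_{|\xi|\le C\sqrt{\varepsilon}} E\int_0^T\!\!\int_0^1 \left|\alpha^{-1/2}\left(\nabla_\theta c_{\theta+h\xi} - \nabla_\theta c_\theta\right)\right|^2\!\left(X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) dh\,ds \to 0, \quad \text{as } \varepsilon \downarrow 0.
\end{aligned} \tag{5.10}$$

The last convergence is true due to the uniform continuity of $\nabla_\theta c_\theta$ in $\theta \in \Theta$ and tightness of
$\{X^\varepsilon, \varepsilon > 0\}$. Using the Itô isometry, the last display implies that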



$$\sup_{\theta\in\mathbb{K}} E\left|J^\varepsilon_3(\theta)\right|^2 \to 0, \quad \text{as } \varepsilon \downarrow 0. \tag{5.11}$$



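Lastly, it remains to consider the term $J^\varepsilon_4(\theta)$. Notice that the standard averaging principle and the
convergence of $X^\varepsilon$ to $\bar X$ as $\varepsilon \downarrow 0$ by Theorem 2.6 imply that

$$E\left|\int_0^T \left|S\!\left(\theta, X^\varepsilon_s, \tfrac{X^\varepsilon_s}{\delta}\right) I^{-1/2}(\theta)u_\varepsilon\right|^2 ds - \int_0^T \left\langle I^{-1/2}(\theta)u_\varepsilon, q(\bar X_s,\theta)\,I^{-1/2}(\theta)u_\varepsilon\right\rangle ds\right| \to 0, \quad \text{as } \varepsilon \downarrow 0. \tag{5.12}$$

The assumptions on the dependence on $\theta$ guarantee that this convergence holds uniformly in $\theta \in \mathbb{K}$.
Then by (5.10)–(5.12) we obtain that

$$\sup_{\theta\in\mathbb{K}} E\left|J^\varepsilon_4(\theta)\right| \to 0, \quad \text{as } \varepsilon \downarrow 0. \tag{5.13}$$

Therefore, we have obtained that

$$\sup_{\theta\in\mathbb{K}} E\left|R(\varepsilon,\theta)\right| \to 0, \quad \text{as } \varepsilon \downarrow 0. \tag{5.14}$$

This establishes that the family $\{P_\theta : \theta \in \mathbb{K}\}$ is uniformly asymptotically normal with normalizing
matrix $\phi(\varepsilon,\theta) = \sqrt{\varepsilon}\,I^{-1/2}(\theta)$, which concludes the proof of the theorem. □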



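6. First Order Langevin Equation

A particular model of interest is the first order Langevin equation

$$dX^\varepsilon_t = -\nabla V^\varepsilon\!\left(X^\varepsilon_t, \tfrac{X^\varepsilon_t}{\delta}\right) dt + \sqrt{\varepsilon}\sqrt{2D}\,dW_t, \qquad X^\varepsilon_0 = x_0, \tag{6.1}$$

where $V^\varepsilon$ is some potential function, $\nabla$ denotes the gradient of $x \mapsto V^\varepsilon(x, x/\delta)$, and $2D$ is the diffusion constant. We are particularly interested
in the case where the potential function is composed of a large-scale smooth part and a fast
oscillating part of smaller magnitude:

$$V^\varepsilon\!\left(x, \tfrac{x}{\delta}\right) = \varepsilon Q\!\left(\tfrac{x}{\delta}\right) + V_\theta(x), \qquad V_\theta(x) = \theta V(x). \tag{6.2}$$

Thus the equation of interest can be written as

$$dX^\varepsilon_t = \left[-\frac{\varepsilon}{\delta}\nabla Q\!\left(\tfrac{X^\varepsilon_t}{\delta}\right) - \theta\nabla V(X^\varepsilon_t)\right] dt + \sqrt{\varepsilon}\sqrt{2D}\,dW_t, \qquad X^\varepsilon_0 = x_0. \tag{6.3}$$

An example of such a potential is given in Figure 1.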




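Figure 1. $V^\varepsilon(x, x/\delta) = V_\theta(x) + \varepsilon Q(x/\delta)$ with $V_\theta(x) = \frac{\theta}{2}x^2$, $Q(x/\delta) = \cos(x/\delta) + \sin(x/\delta)$
and parameters $\varepsilon = 0.1$, $\delta = 0.01$ and $\theta = 1$.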

For the potential function drawn in Figure 1, the unknown parameter $\theta$ corresponds to the curvature of $V_\theta(x)$ around the equilibrium point.

We are interested in the statistical estimation problem for the parameter $\theta$ in the case of Regime
1, i.e., when $\varepsilon/\delta \uparrow \infty$. In Subsection 6.1 we study the estimation problem for $\theta$ based on the
methodology described in Subsection 4.2. In Subsection 6.2 we study the corresponding central
limit theorem. In Subsection 6.3 we present a simulation study.

6.1. Pseudo Limiting Likelihood and Proposed Estimator for $\theta$. To connect to our notation,
let $b(x,y) = -\nabla Q(y)$, $c_\theta(x,y) = -\theta\nabla V(x)$, $\sigma(x,y) = \sqrt{2D}\,I$, and consider Regime 1. In this
case there is an explicit formula for the invariant measure $\mu(dy)$, which is the Gibbs distribution

$$\mu(dy) = \frac{1}{Z}e^{-\frac{Q(y)}{D}}\,dy, \qquad Z = \int_{\mathcal{Y}} e^{-\frac{Q(y)}{D}}\,dy. \tag{6.4}$$

Moreover, it is easy to see that the centering Conditions 2.2 and 4.5 hold. Notice that in this case
the invariant measure depends neither on $x \in \mathbb{R}^d$ nor on $\theta \in \Theta$. We also define

$$\hat Z = \int_{\mathcal{Y}} e^{\frac{Q(y)}{D}}\,dy. \tag{6.5}$$
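The normalizing constants in (6.4)–(6.5) are one-dimensional integrals when $d = 1$ and are easy to approximate. The sketch below (names illustrative) uses the midpoint rule, which is very accurate for smooth periodic integrands, with the fast part $Q(y) = \cos y + \sin y$ of Figure 1 and the hypothetical value $D = 1$. It also forms the ratio $\lambda^2/(Z\hat Z)$ with period $\lambda = 2\pi$, which by the Cauchy–Schwarz inequality never exceeds one and which appears in the separable case of Corollary 6.2. For this $Q$, $Z = \hat Z = 2\pi I_0(\sqrt{2}/D)$, where $I_0$ is the modified Bessel function of the first kind, which gives an independent check.

```python
import math


def partition_functions(Q, D, period, n=20_000):
    """Midpoint-rule values of Z = int e^{-Q/D} dy and Zhat = int e^{Q/D} dy over one period (d = 1)."""
    h = period / n
    ys = [(k + 0.5) * h for k in range(n)]
    Z = h * sum(math.exp(-Q(y) / D) for y in ys)
    Zhat = h * sum(math.exp(Q(y) / D) for y in ys)
    return Z, Zhat


def gibbs_density(Q, D, Z):
    """Density of mu(dy) in (6.4): e^{-Q(y)/D} / Z."""
    return lambda y: math.exp(-Q(y) / D) / Z


Q = lambda y: math.cos(y) + math.sin(y)  # fast part from Figure 1
D = 1.0                                   # hypothetical diffusion constant
period = 2.0 * math.pi
Z, Zhat = partition_functions(Q, D, period)
mu = gibbs_density(Q, D, Z)
ratio = period**2 / (Z * Zhat)            # lambda^2 / (Z Zhat)
```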

We have the following proposition.



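Proposition 6.1. Under the conditions and notation of Theorem 4.6, we have that the error term
is given by

$$H_{\theta,\theta_0,T}(z) = \frac{\theta\theta_0}{2D}\int_0^T \left\langle \nabla V(x_s), \left(\int_{\mathcal{Y}} \frac{\partial\chi}{\partial y}(y)\,\mu(dy)\right)\nabla V(x_s)\right\rangle ds. \tag{6.6}$$

Proof. In the case $b(x,y) = -\nabla Q(y)$, $c_\theta(x,y) = -\theta\nabla V(x)$ and $\sigma(x,y) = \sqrt{2D}\,I$, we notice that
the solution $\Phi$ to the Poisson equation (4.10) is related to the solution $\chi$ of the cell problem (2.1)
via the relation

$$\Phi_{\theta,\theta_0}(x,y) = -\frac{\theta}{2D}\langle \chi(y), \nabla V(x)\rangle. \tag{6.7}$$

Indeed, $\langle c_\theta, b\rangle_{\alpha} = \frac{\theta}{2D}\langle\nabla V(x), \nabla Q(y)\rangle$ and $\mathcal{L}^1\chi = -b = \nabla Q$. Hence $\nabla_y\Phi_{\theta,\theta_0}(x,y) = -\frac{\theta}{2D}\left(\frac{\partial\chi}{\partial y}(y)\right)^T\nabla V(x)$, and plugging this into (4.12) with $c_{\theta_0} = -\theta_0\nabla V$ gives (6.6).

This concludes the proof of the proposition. □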

When the fluctuating part is separable, i.e., $Q(y_1, y_2, \dots, y_d) = Q_1(y_1) + Q_2(y_2) + \dots + Q_d(y_d)$, everything can be calculated explicitly. We summarize the results in the following corollary. This corollary also shows that in this case we can derive an asymptotically unbiased estimator of $\theta_0$ in closed form.

Corollary 6.2. Assume $Q(y_1, y_2, \dots, y_d) = Q_1(y_1) + Q_2(y_2) + \dots + Q_d(y_d)$ and consider Regime 1. Under the conditions and notation of Theorem 4.6, the error term is given by

$$H_{\theta,\theta_0}(x_{\cdot}) = \int_0^T \left\langle \nabla V(x_s), \left(-I + \lambda^2\Gamma\right)\nabla V(x_s)\right\rangle ds, \tag{6.8}$$

where $\lambda > 0$ is the (common) period of the functions $Q_i$ in the corresponding direction,

$$\Gamma = \mathrm{diag}\left(\frac{1}{Z_1\hat{Z}_1}, \dots, \frac{1}{Z_d\hat{Z}_d}\right),$$

and, for $i = 1, 2, \dots, d$,

$$Z_i = \int_0^{\lambda} e^{-Q_i(y_i)/D}\,dy_i, \qquad \hat{Z}_i = \int_0^{\lambda} e^{Q_i(y_i)/D}\,dy_i.$$

Moreover, we have that $H_{\theta_0,\theta_0}(x_{\cdot}) < 0$.

Furthermore, recall the MLE 4- // \\W{xs)\\\ ds ^ for both K = I and K = T'^ , then 

\ !^\\yv{xM]ds I 

converges in P^g probability to Oq, i.e., 9^ is an asymptotically unbiased estimator of Oq. 
Proof. The separability assumption on $Q(y)$ gives us

$$\int_{\mathcal{Y}} \frac{\partial \chi}{\partial y}(y)\,\mu(dy) = -I + \lambda^2\Gamma.$$

Plugging that into (6.6), we immediately get the simplified representation of the error term.
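The separable computation can be made explicit. The following sketch rests on our reading of the notation: with $b = -\nabla Q$, the cell problem reads $\mathcal{L}\chi = \nabla Q$ with $\mathcal{L} = -\nabla Q(y)\cdot\nabla_y + D\Delta_y$. Under separability $\chi = (\chi_1(y_1), \dots, \chi_d(y_d))$, so $\partial\chi/\partial y$ is diagonal and each component solves a one-dimensional periodic problem on $[0, \lambda]$:

$$\begin{aligned}
D\chi_i'' - Q_i'\chi_i' = Q_i' &\iff D(1+\chi_i')' - Q_i'(1+\chi_i') = 0 \iff 1 + \chi_i'(y_i) = C_i\,e^{Q_i(y_i)/D},\\
\int_0^{\lambda} \chi_i'(y_i)\,dy_i = 0 &\implies C_i\hat{Z}_i = \lambda \implies C_i = \frac{\lambda}{\hat{Z}_i},\\
\int_0^{\lambda} \left(1 + \chi_i'(y_i)\right)\frac{e^{-Q_i(y_i)/D}}{Z_i}\,dy_i &= \frac{C_i\lambda}{Z_i} = \frac{\lambda^2}{Z_i\hat{Z}_i}.
\end{aligned}$$

Since $\mu$ factorizes into the one-dimensional Gibbs densities $e^{-Q_i(y_i)/D}/Z_i$, this yields $\int_{\mathcal{Y}} \frac{\partial \chi}{\partial y}(y)\,\mu(dy) = -I + \lambda^2\Gamma$, the identity used above.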

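The second claim follows from Hölder's (Cauchy–Schwarz) inequality. Indeed, since $\lambda = \int_0^{\lambda} e^{-Q_i(y_i)/(2D)}e^{Q_i(y_i)/(2D)}\,dy_i \le (Z_i\hat{Z}_i)^{1/2}$, with equality only when $Q_i$ is constant, it is easy to see that $\lambda^2/(Z_i\hat{Z}_i) < 1$ for each $i$, so that $-I + \lambda^2\Gamma$ is negative definite. Therefore, we obtain $H_{\theta_0,\theta_0}(x_{\cdot}) < 0$.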

Next, we maximize the limiting log-likelihood function. By straightforward substitution into (4.11) we see that

4,eo,Ti^-) = jy 11^^ {y)\\lDii^idy) + ^ [ee, - \e^^ £ \\vv ix,)fjds. 

We collect terms and write

zleM^-) = jy ^y)\\lDi (-^^' [ \Wy{xs)\\] ds + ee^x^ £ ||vy(x,)||^, ds 

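Then, it is easy to see that this quantity is maximized for

$$\hat{\theta} = \theta_0\,\frac{\lambda^2\int_0^T \langle \nabla V(x_s), \Gamma\,\nabla V(x_s)\rangle\,ds}{\int_0^T \|\nabla V(x_s)\|^2\,ds}. \tag{6.9}$$

Then, using Theorem 4.6, we obtain the statement of the corollary. □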
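For $d = 1$ the constants of Corollary 6.2 are one-dimensional integrals and can be evaluated numerically. The sketch below (NumPy; the function name and the value $D = 1$ are our own illustrative assumptions) computes $\lambda^2/(Z\hat{Z})$ for the fast potential $Q(y) = \cos(y) + \sin(y)$ of the simulation study, whose period is $\lambda = 2\pi$. Because $\cos y + \sin y = \sqrt{2}\sin(y + \pi/4)$ and $\int_0^{2\pi} e^{a\sin y}\,dy = 2\pi I_0(a)$, the exact value is $1/I_0(\sqrt{2}/D)^2$, which the code uses as a check.

```python
import numpy as np


def homogenized_factor(Q, period, D, num=4096):
    """Approximate lambda^2 / (Z * Z_hat) for a 1-D, `period`-periodic potential Q.

    Z and Z_hat are the integrals of exp(-Q/D) and exp(+Q/D) over one period.
    For a smooth periodic integrand, the rectangle rule on a uniform grid
    (endpoint excluded) is highly accurate.
    """
    y = np.linspace(0.0, period, num, endpoint=False)
    Z = period * np.mean(np.exp(-Q(y) / D))
    Z_hat = period * np.mean(np.exp(Q(y) / D))
    return period**2 / (Z * Z_hat)


def Q(y):
    # Fast potential used in the simulation study.
    return np.cos(y) + np.sin(y)


D = 1.0  # hypothetical diffusion constant, for illustration only
factor = homogenized_factor(Q, 2.0 * np.pi, D)
closed_form = 1.0 / np.i0(np.sqrt(2.0) / D) ** 2
print(factor, closed_form)  # both approximately 0.408, strictly below 1
```

The value lies strictly below 1, in line with the Hölder argument in the proof. In this one-dimensional setting, normalizing by $(\lambda^2\Gamma)^{-1}$ as in Subsection 6.3 amounts to dividing the estimate by `factor`.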

6.2. Central Limit Theorem for the pseudo MLE. In this section, we prove a central limit theorem for the maximum likelihood estimator of the first order Langevin equation (6.3).

Based on the modified log-likelihood function (i.e., on (4.15)), the maximum likelihood estimator can be written as

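$$\hat{\theta}^{\epsilon} = \frac{-\sqrt{2D}\sqrt{\epsilon}\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \langle \nabla V(x_s), dW_s\rangle + \left(\frac{\epsilon}{\delta}\right)^3\int_0^T \left\langle \nabla V(x_s), \nabla Q\left(\frac{x_s}{\delta}\right)\right\rangle ds + \theta_0\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \|\nabla V(x_s)\|^2\,ds}{\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \|\nabla V(x_s)\|^2\,ds}. \tag{6.10}$$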
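Some algebraic manipulation in (6.10) gives us

$$\hat{\theta}^{\epsilon} = \theta_0 + \frac{\left(\frac{\epsilon}{\delta}\right)^3\int_0^T \left\langle \nabla V(x_s), \nabla Q\left(\frac{x_s}{\delta}\right)\right\rangle ds}{\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \|\nabla V(x_s)\|^2\,ds} - \sqrt{2D}\sqrt{\epsilon}\,\frac{\int_0^T \langle \nabla V(x_s), dW_s\rangle}{\int_0^T \|\nabla V(x_s)\|^2\,ds}. \tag{6.11}$$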

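We have the following theorem.

Theorem 6.3. Assume Condition 2.1. Consider the first order Langevin equation (6.3) and assume Regime 1. Let $\hat{\theta}^{\epsilon}$ be the maximum likelihood estimator of $\theta_0$ based on the modified log-likelihood function (4.15). Then, in distribution under $\mathbb{P}_{\theta_0}$, the following central limit result holds:

$$\frac{1}{\sqrt{\epsilon}}\left(\hat{\theta}^{\epsilon} - \theta_0 - \frac{\left(\frac{\epsilon}{\delta}\right)^3\int_0^T \left\langle \nabla V(x_s), \nabla Q\left(\frac{x_s}{\delta}\right)\right\rangle ds}{\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \|\nabla V(x_s)\|^2\,ds}\right) \Rightarrow \mathcal{N}\left(0, \frac{2D}{\int_0^T \|\nabla V(x_s)\|^2\,ds}\right). \tag{6.12}$$

Moreover, assuming $Q(y_1, y_2, \dots, y_d) = Q_1(y_1) + Q_2(y_2) + \dots + Q_d(y_d)$, we also have that, in $\mathbb{P}_{\theta_0}$-probability,

$$\hat{\theta}^{\epsilon} \to \theta_0\,\frac{\lambda^2\int_0^T \langle \nabla V(x_s), \Gamma\,\nabla V(x_s)\rangle\,ds}{\int_0^T \|\nabla V(x_s)\|^2\,ds}, \tag{6.13}$$

which is (6.9).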



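Proof. The first statement follows directly from the representation (6.11) of the maximum likelihood estimator and the central limit theorem for stochastic integrals; see, for example, Lemma 1.8 in Chapter I of [IS].

The second statement is proved as follows. Consider the unique smooth solution $\Phi(x,y)$, bounded and periodic in $y$, of the auxiliary problem

$$\mathcal{L}\Phi(x,y) = -\langle \nabla V(x), \nabla Q(y)\rangle, \qquad \int_{\mathcal{Y}} \Phi(x,y)\,\mu(dy) = 0. \tag{6.14}$$

By applying Itô's formula to $\Phi(x,y)$ (compare with (4.16) and (4.17)), we get

$$\frac{\epsilon}{\delta}\int_0^T \left\langle \nabla V(x_s), \nabla Q\left(\frac{x_s}{\delta}\right)\right\rangle ds = -\theta_0\int_0^T \left\langle \nabla V(x_s), \nabla_y\Phi\left(x_s, \frac{x_s}{\delta}\right)\right\rangle ds + o_P(1), \tag{6.15}$$

where the term $o_P(1)$ converges to zero in probability as $\epsilon, \delta \downarrow 0$. Therefore, by substituting, we obtain

$$\frac{\left(\frac{\epsilon}{\delta}\right)^3\int_0^T \left\langle \nabla V(x_s), \nabla Q\left(\frac{x_s}{\delta}\right)\right\rangle ds}{\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \|\nabla V(x_s)\|^2\,ds} = -\theta_0\,\frac{\left(\frac{\epsilon}{\delta}\right)^2\int_0^T \left\langle \nabla V(x_s), \nabla_y\Phi\left(x_s, \frac{x_s}{\delta}\right)\right\rangle ds}{\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \|\nabla V(x_s)\|^2\,ds} + o_P(1). \tag{6.16}$$

Then, as in Proposition 6.1 and Corollary 6.2, we can solve the auxiliary PDE (6.14) in closed form and obtain the statement of the theorem. □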
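The Itô step behind (6.15) can be sketched as follows. This is our own reconstruction, assuming that $\mathcal{L} = -\nabla Q(y)\cdot\nabla_y + D\Delta_y$ in (6.14), that $\Phi$ has bounded derivatives, and writing $\Phi_s = \Phi(x_s, x_s/\delta)$ along the solution of (6.3) with $\theta = \theta_0$. Multiplying Itô's formula for $\Phi_s$ by $\delta$ gives

$$\delta\left(\Phi_T - \Phi_0\right) = \frac{\epsilon}{\delta}\int_0^T \mathcal{L}\Phi\left(x_s, \frac{x_s}{\delta}\right)ds - \theta_0\int_0^T \left\langle \nabla V(x_s), \nabla_y\Phi\left(x_s, \frac{x_s}{\delta}\right)\right\rangle ds + \sqrt{2D\epsilon}\int_0^T \left\langle \nabla_y\Phi\left(x_s, \frac{x_s}{\delta}\right), dW_s\right\rangle + R_T,$$

where $R_T$ collects the terms containing $x$-derivatives of $\Phi$, each multiplied by one of the factors $\epsilon$, $\delta$, $\delta\epsilon$ or $\delta\sqrt{\epsilon}$. Substituting $\mathcal{L}\Phi = -\langle \nabla V, \nabla Q\rangle$ and noting that the left-hand side, the stochastic integral and $R_T$ vanish in probability as $\epsilon, \delta \downarrow 0$ yields (6.15).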
6.3. A Simulation Study. We apply our results in the case when $V^{\epsilon}(x, x/\delta) = \epsilon\left(\cos(x/\delta) + \sin(x/\delta)\right) + \theta V(x)$ and $V(x) = \frac{1}{2}x^2$.

As we discussed in the previous sections, in this case we need to work with the modified likelihood, since $b \neq 0$. As we proved in Corollary 6.2, due to the separability of $Q$, we can obtain an asymptotically unbiased estimator when it is properly normalized.

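We start by simulating the model. We use an Euler discretization scheme for the multiscale diffusion as follows:

$$x_{t_{k+1}} = x_{t_k} + \left[-\frac{\epsilon}{\delta}\left(\cos\left(\frac{x_{t_k}}{\delta}\right) - \sin\left(\frac{x_{t_k}}{\delta}\right)\right) - \theta\, x_{t_k}\right](t_{k+1} - t_k) + \sqrt{\epsilon}\sqrt{2D}\left(W_{t_{k+1}} - W_{t_k}\right),$$

where $k = 1, \dots, n$ and $n$ is the number of simulated values. For the simulated data we choose $\epsilon = 0.1$ and $\delta = 0.01$.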

From [8], the error of the Euler scheme is bounded above by $\Delta\epsilon/\delta^2$, where $\Delta$ denotes the discretization step. Therefore, if we want an error of order $0.001$, we need to choose the discretization step to be $\Delta = 0.001\,\delta^2/\epsilon$. For the simulation procedure, we choose $\Delta = 10^{-6}$
and n = 10^. 
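A minimal NumPy sketch of the Euler scheme above for the one-dimensional example, with $Q(y) = \cos(y) + \sin(y)$ and $V(x) = x^2/2$. The function names, the vectorization over independent paths, the initial value $x_0 = 1$ and the value $D = 1$ in the usage lines are our own assumptions; `euler_step_size` simply inverts the error bound $\Delta\epsilon/\delta^2$ quoted above.

```python
import numpy as np


def grad_Q(y):
    # Q(y) = cos(y) + sin(y)  =>  Q'(y) = cos(y) - sin(y)
    return np.cos(y) - np.sin(y)


def grad_V(x):
    # V(x) = x**2 / 2  =>  V'(x) = x
    return x


def euler_step_size(tol, eps, delta):
    # Choose Delta so that the error bound Delta * eps / delta**2 equals tol.
    return tol * delta**2 / eps


def simulate_paths(x0, theta, eps, delta, D, dt, n, num_paths=1, rng=None):
    """Euler scheme for dX = [-(eps/delta) Q'(X/delta) - theta V'(X)] dt
    + sqrt(eps) sqrt(2D) dW; returns an array of shape (num_paths, n + 1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty((num_paths, n + 1))
    x[:, 0] = x0
    # Standard deviation of sqrt(eps) * sqrt(2D) * (W_{t+dt} - W_t).
    noise_scale = np.sqrt(eps * 2.0 * D * dt)
    for k in range(n):
        xk = x[:, k]
        drift = -(eps / delta) * grad_Q(xk / delta) - theta * grad_V(xk)
        x[:, k + 1] = xk + drift * dt + noise_scale * rng.standard_normal(num_paths)
    return x


dt = euler_step_size(1e-3, eps=0.1, delta=0.01)  # 1e-6, as in the text
paths = simulate_paths(x0=1.0, theta=1.0, eps=0.1, delta=0.01, D=1.0,
                       dt=dt, n=1000, num_paths=4, rng=np.random.default_rng(0))
```

The loop is kept explicit for readability; with $\Delta = 10^{-6}$ a pure-Python loop over many steps is slow, so in practice one would vectorize over paths (as done here) and keep the time loop as tight as possible.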

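The maximum likelihood procedure consists of constructing the pseudo log-likelihood function (4.9). More specifically,

$$\mathcal{Z}_T(x_{\cdot}) = -\frac{\theta}{\epsilon\sigma^2}\left[\left(1 + \left(\frac{\epsilon}{\delta}\right)^2\right)\int_0^T \nabla V(x_s)\,dx_s + \frac{\epsilon}{\delta}\int_0^T \nabla Q\left(\frac{x_s}{\delta}\right)\nabla V(x_s)\,ds\right] - \frac{\theta^2}{2\epsilon\sigma^2}\left(1 + \left(\frac{\epsilon}{\delta}\right)^2\right)\int_0^T \left(\nabla V(x_s)\right)^2 ds + \text{const},$$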



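where $\sigma = \sqrt{2D}$ and const is a quantity that does not depend on the parameter $\theta$. The maximizer of this quantity is

$$\theta_{\max} = -\frac{\left(1 + \left(\frac{\epsilon}{\delta}\right)^2\right)\int_0^T \nabla V(x_s)\,dx_s + \frac{\epsilon}{\delta}\int_0^T \nabla Q\left(\frac{x_s}{\delta}\right)\nabla V(x_s)\,ds}{\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\int_0^T \left(\nabla V(x_s)\right)^2 ds}.$$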

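Although both our model and our MLE are formulated in continuous time, in practice we obtain data in discrete time. Therefore, we need to discretize the estimator in order to implement it. Discretizing the stochastic and Lebesgue integrals directly, we obtain

$$\theta_{\max} \approx -\frac{\left(1 + \left(\frac{\epsilon}{\delta}\right)^2\right)\sum_{i=1}^{n} \nabla V(x_{s_i})\left(x_{s_{i+1}} - x_{s_i}\right) + \frac{\epsilon}{\delta}\sum_{i=1}^{n} \nabla Q\left(\frac{x_{s_i}}{\delta}\right)\nabla V(x_{s_i})\left(s_{i+1} - s_i\right)}{\left(\left(\frac{\epsilon}{\delta}\right)^2 + 1\right)\sum_{i=1}^{n} \left(\nabla V(x_{s_i})\right)^2\left(s_{i+1} - s_i\right)}.$$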
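The discretized estimator can be sketched in NumPy as follows, assuming equally spaced observation times with step `dt` and the one-dimensional example of this section; the function names are ours and the left-point (Itô) sums follow the formula above.

```python
import numpy as np


def grad_Q(y):
    return np.cos(y) - np.sin(y)  # Q(y) = cos(y) + sin(y)


def grad_V(x):
    return x  # V(x) = x**2 / 2


def theta_max(x, dt, eps, delta):
    """Discretized maximizer of the pseudo log-likelihood for one observed
    path x, sampled at equally spaced times with step dt."""
    r = eps / delta
    x_left = x[:-1]       # x_{s_i}
    dx = np.diff(x)       # x_{s_{i+1}} - x_{s_i}
    gv = grad_V(x_left)
    numerator = (1.0 + r**2) * np.sum(gv * dx) + r * np.sum(grad_Q(x_left / delta) * gv) * dt
    denominator = (r**2 + 1.0) * np.sum(gv**2) * dt
    return -numerator / denominator
```

As a sanity check of the algebra, on a noise-free Euler path ($D = 0$) generated with parameter $\theta$, this function returns exactly $\theta + (\epsilon/\delta)^3\sum_i \nabla Q(x_{s_i}/\delta)\nabla V(x_{s_i}) / \big((1 + (\epsilon/\delta)^2)\sum_i (\nabla V(x_{s_i}))^2\big)$, the discrete analogue of the bias term that has to be handled before the estimator becomes asymptotically unbiased.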
The asymptotically unbiased estimator is the normalized $\theta_{\max}$. The normalizing factor equals $(\lambda^2\Gamma)^{-1}$, with $\lambda$ and $\Gamma$ as defined in Corollary 6.2.

Remark 6.4. It is important to mention here that we do not simplify the stochastic integral using Itô's lemma. The reason is that, in order to compute the estimator for $\theta$, we may use only the observations that are available. If we use Itô's lemma, then the resulting expression contains an integral with respect to the Brownian motion, a process that is not observed.

Using simulated data, we construct the MLE for different values of the true parameter $\theta$. The results are summarized in Table 1, along with the corresponding 68% and 95% confidence intervals. Both are empirical intervals, meaning that we repeat the procedure (simulation followed by estimation) several times ($M = 100$). We then obtain the Monte Carlo estimator for $\theta$ as the average of all estimators, together with the Monte Carlo standard deviation that we use in the construction of the intervals.
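The interval construction just described can be sketched as follows. The reported intervals in Table 1 appear consistent with mean $\pm\, s$ (68%) and mean $\pm\, 1.96\,s$ (95%), where $s$ is the Monte Carlo standard deviation; for $\theta = 1$, for instance, the half-widths are about $1.075$ and $2.108$, whose ratio is about $1.96$. The use of the sample standard deviation (`ddof=1`), the function name and the input values below are our assumptions.

```python
import numpy as np


def mc_intervals(estimates, z95=1.96):
    """Monte Carlo mean, standard deviation and empirical 68% / 95% intervals."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    sd = est.std(ddof=1)  # sample standard deviation (ddof=1 is our choice)
    ci68 = (mean - sd, mean + sd)
    ci95 = (mean - z95 * sd, mean + z95 * sd)
    return mean, sd, ci68, ci95


# Hypothetical estimates from M = 5 repetitions, for illustration only.
mean, sd, ci68, ci95 = mc_intervals([0.9, 1.1, 1.0, 1.2, 0.8])
```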



True value θ    Estimator    68% confidence interval    95% confidence interval
1               1.042        (-0.0329, 2.118)           (-1.065, 3.150)
2               1.970        (0.1827, 3.758)            (-1.533, 5.473)
0.1             0.103        (0.0125, 0.1928)           (-0.0739, 0.2795)

Table 1. Estimated values of $\theta$ and the corresponding empirical 68% and 95% confidence intervals for various true parameters $\theta$.

For $\theta = 1$, we plot in Figure 2 the histogram of the empirical distribution obtained from the Monte Carlo procedure. For comparison, on the same graph we also plot the density curve of the theoretical asymptotic (normal) distribution, with the variance computed in Theorem 6.3.
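To overlay the theoretical curve, one needs the asymptotic standard deviation. Reading it off the stochastic-integral term $-\sqrt{2D}\sqrt{\epsilon}\int_0^T \langle \nabla V(x_s), dW_s\rangle / \int_0^T \|\nabla V(x_s)\|^2 ds$ of the estimator gives variance $2D\epsilon / \int_0^T \|\nabla V(x_s)\|^2 ds$. The sketch below assumes this reading, $V(x) = x^2/2$, a left Riemann sum for the integral, and hypothetical inputs; if the histogram is of the normalized estimator, the standard deviation should be multiplied by the same normalizing factor.

```python
import numpy as np


def asymptotic_sd(x, dt, eps, D):
    """sqrt(2 D eps / int_0^T V'(x_s)^2 ds), with V(x) = x**2 / 2 and a left Riemann sum."""
    integral = np.sum(x[:-1] ** 2) * dt
    return np.sqrt(2.0 * D * eps / integral)


def normal_pdf(z, mean, sd):
    # Density of N(mean, sd**2), evaluated elementwise.
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


# Hypothetical path and constants, for illustration only.
x = np.ones(11)
sd = asymptotic_sd(x, dt=0.1, eps=0.1, D=1.0)  # integral = 1.0, so sd = sqrt(0.2)
grid = np.linspace(-3 * sd, 3 * sd, 7)
density = normal_pdf(grid, 0.0, sd)
```

In an actual plot, `mean` would be set to the centering of (6.12) rather than 0, and the curve would be drawn over the histogram of the Monte Carlo estimates.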

7. Conclusions 

In this paper we studied the parameter estimation problem for diffusion processes with multiple 
scales and vanishing noise. Under certain conditions, we derived consistent estimators and proved 
the related central limit theorems. The theoretical results are supported by a simulation study of 
the first order Langevin equation in a rough potential. Such results are useful when one is interested 
in parameter estimation of dynamical systems with more than one scale (e.g., in rough potentials)
perturbed by small noise. 




Figure 2. Histogram of the estimator $\theta_{\max}$ for the simulated dataset and the corresponding (theoretical) density curve from Theorem 6.3.